BMC Genomics — Latest Matching Preprints

1

A tissue-resolved transcriptomic atlas of adult male Halyomorpha halys reveals tissue-specific RNAi machinery and a minimal systemic response to non-specific dsRNA

Amineni, V. P. S.; Ramapuram, S.; Panfilio, K. A.

2026-05-29 genomics 10.64898/2026.05.26.728018 medRxiv

Top 0.1%

19.7%

Background: Halyomorpha halys (brown marmorated stink bug) is an invasive polyphagous pest causing significant agricultural damage worldwide and is an emerging target for RNAi-based pest management. Despite growing interest in dsRNA-based biocontrol, progress is constrained by the lack of tissue-resolved transcriptomic resources covering key biological processes such as feeding, detoxification, and reproduction. Furthermore, our understanding of how RNAi machinery expression varies across tissues remains limited, which impairs both target gene selection and predictions of RNAi efficacy. Critically, the transcriptional response of H. halys to haemolymph-delivered non-specific dsRNA represents a key knowledge gap for evaluating potential non-target immune reactions of dsRNA-based approaches. Results: Field-collected adult males were injected with either nuclease-free water or dsRNA targeting GFP (dsGFP), and transcriptomes were generated from the brain, midgut, salivary glands, and testes. Sequencing produced high-quality datasets with clear tissue-level separation and tight clustering of biological replicates. As expected in targeting a non-endogenous gene, differential expression analysis revealed a limited transcriptional response to dsGFP. Baseline profiling of RNAi pathway genes in controls showed broad expression of core siRNA and miRNA components across all tissues, yet with marked specialisation: two additional Argonaute-2 isoforms and multiple piRNA factors were testes-specific, whereas salivary glands showed strong, restricted expression of nuclease-encoding genes, including a T2 ribonuclease and a non-specific endonuclease. Expression atlases also revealed pronounced tissue partitioning for other protein families. Consistent with their respective functions, secreted trypsins and chymotrypsins are salivary-enriched while the cathepsins for intracellular protein catabolism are midgut-enriched, with brain-centred neuropeptide expression. However, we also uncovered unexpected nuance, such as closely related subfamilies of Cytochrome P450s, which generally function as detoxification enzymes, being partitioned between the midgut, brain or testes. Conclusions: This work delivers the first tissue-resolved transcriptomic atlas of adult male H. halys, providing a high-resolution resource on compartmentalization of proteolysis, detoxification, and neuroendocrine signalling, as well as for candidate gene discovery in RNAi-based pest control. The modest, tissue-restricted transcriptional response to non-specific dsRNA, together with strong tissue-specific enrichment of some components, offers mechanistic insight into tissue-dependent RNAi efficiency and supports rational dsRNA target selection in H. halys.

2

A chromatin accessibility map of pea aphid brain and embryo identifies tissue-specific regulatory elements

Liu, X.; Brisson, J. A.

2026-05-15 genomics 10.64898/2026.05.14.725175 medRxiv

Top 0.1%

14.3%

The pea aphid (Acyrthosiphon pisum) is an important model organism for studying complex biological traits, including wing polyphenism and host-symbiont interactions, yet its regulatory genomic landscape remains largely uncharacterized. Here we present the first genome-wide chromatin accessibility map of the pea aphid, generated using the assay for transposase-accessible chromatin followed by sequencing (ATAC-seq). We profiled open chromatin regions (OCRs) in adult brains and late-stage embryos from winged and wingless morphs maintained under solitary or crowded conditions. We also paired ATAC-seq with RNA-seq in embryonic samples to examine the relationship between chromatin accessibility and gene expression. Libraries showed a high abundance of reads from the aphid endosymbionts Spiroplasma and Buchnera, reflecting preferential Tn5 transposase insertion into nucleosome-free bacterial DNA. After computational removal of these reads, the remaining aphid-mapping libraries displayed hallmarks of high-quality ATAC-seq data. We identified a consensus set of 37,127 OCRs enriched at promoters and distal regulatory elements, with substantial overlap with computationally predicted enhancers and enrichment for transcription factor binding motifs. Tissue identity was the dominant driver of chromatin variation, accounting for 85% of variance along the first principal component, with 19,513 differentially accessible regions distinguishing brain from embryo samples. By contrast, differences associated with wing morph or crowding treatment were modest. Promoter accessibility was significantly and positively correlated with gene expression genome-wide. Together, these data constitute a foundational regulatory genomics resource for the pea aphid and establish a framework for mechanistic studies of gene regulation in this ecologically and economically important insect.

3

Selecting genomes that matter: haplotype-based prioritization for iterative pangenome expansion

Marone, M. P.; Chen, E.; Himmelbach, A.; Haberer, G.; Spannagl, M.; Stein, N.; Mascher, M.

2026-05-18 genomics 10.64898/2026.05.13.724976 medRxiv

Top 0.1%

12.1%

Background: As pangenomes approach saturation, identifying additional genomes that contribute novel sequence information becomes increasingly difficult. Current sample-selection strategies often rely on global diversity metrics or variant counts and do not explicitly account for the composition of an existing pangenome, a limitation that becomes increasingly relevant as pangenomes mature. Here, we present SelHap, a haplotype-based pipeline that uses whole-genome sequencing (WGS) data to prioritize accessions based on their contribution of novel haplotypes relative to a defined background, enabling targeted and iterative pangenome expansion. Results: We applied SelHap to the barley pangenome, using 76 assembled genomes as a background to select new accessions from a large WGS panel. Using this approach, we generated chromosome-scale genome assemblies from 19 accessions selected with SelHap and from 17 elite lines selected based on their relevance in historical barley breeding. Across multiple benchmarking scenarios, SelHap-based selection consistently resulted in a greater increase in non-redundant (single-copy) pangenome sequence, demonstrating that prioritizing haplotype novelty relative to an existing background maximizes unrepresented sequence content. Conclusions: By transforming complex haplotype-clustering outputs into interpretable summaries and ranked candidate lists, SelHap provides a practical framework for targeted pangenome expansion. Beyond sample selection, SelHap can facilitate ancestry and germplasm comparisons across diverse panels. As WGS data become more accessible, SelHap offers a scalable and interpretable solution for extending mature pangenomes by explicitly targeting previously unrepresented sequence space.

4

A Putative Single-Locus Determinant of the Suppressed In Ovo Virus Infection (SOV) Trait in Apis mellifera

Lefebre, R.; Broeckx, B. J. G.; De Smet, L.; Braeckman, M.; Gregorc, A.; Peelman, L.; de Graaf, D. C.

2026-05-29 genomics 10.64898/2026.05.28.728461 medRxiv

Top 0.1%

10.3%

Today, deformed wing virus (DWV) is considered one of the major causes of elevated colony losses of the western honey bee (Apis mellifera) worldwide. Virus transmission may occur horizontally between individuals of the same generation, but also vertically from parents to offspring. The recently defined heritable suppressed in ovo virus infection (SOV) trait describes the absence of viruses in pooled drone eggs of a queen, associated with significantly lower DWV prevalence and viral loads in the subsequent developmental offspring stages. By definition, the trait reflects the absence of vertical virus transmission from SOV-positive (SOV+) queens themselves to their offspring. However, the genetic basis influencing this heritable virus resilience has not been identified yet. In this study, we aimed to identify SOV-associated genetic marker(s) or loci in the honey bee genome through genome-wide variant comparison of 44 DWV-positive and 44 DWV-negative drone pupae descended from an artificially created hybrid SOV+/SOV- colony. After whole genome sequencing (WGS), variant calling, and genotype-phenotype association analysis by means of single marker tests and elastic net regression, one variant in a locus of 241,246 bp on chromosome 7 that contained 17 other highly SOV-associated variants classified 68.2% of the drone phenotypes correctly. These results may support the potential application of marker-assisted selection (MAS) strategies targeting reduced vertical virus transmission in honey bees.

5

A framework for identifying transcript orthologs: the evolution of sex bias in alternative transcript structure in Drosophila

Bankole, K.; McIntyre, L.; Garan, M.; Morse, A. M.; Keil, N.; Hernandez, A.; Barmina, O.; Khan, M.; Kopp, A.; Rogers, R.; Graze, R. M.

2026-05-26 genomics 10.64898/2026.05.25.727716 medRxiv

Top 0.1%

9.9%

Background: Recent advances in long read technologies provide an unprecedented opportunity to study transcript evolution. However, comparative evolutionary studies, even in Drosophila, are limited by inconsistent and incomplete annotation, and the lack of annotated transcript homology. Results: In this study of five species spanning 28 million years (D. melanogaster, D. simulans, D. yakuba, D. santomea and D. serrata), we infer transcript homology using reciprocal liftover, and orthology using network analyses, with data validation from long read RNA-seq of male and female head tissue. We build the first genus level annotation, with 15,996 genes and 56,370 transcripts. Expressed transcripts are conserved: 73% of transcript orthologs are detected in all species. Even the improved annotation underestimates the number of genes with alternative transcripts, with 75% of genes expressing multiple structurally diverse transcripts. In a replicated quantitative evaluation of ~10,000 genes, both male- and female-biased transcripts are expressed in 410 (D. melanogaster), 608 (D. simulans), and 493 (D. serrata) genes and in 118 orthologous genes in the D. melanogaster-D. simulans species pair, indicating greater potential for resolution of sexual conflict by alternative transcription than previously appreciated. We identified 605 transcript orthologs conserved for sex bias in the D. melanogaster-D. simulans species pair and of these, 22 male- and 19 female-biased transcripts were conserved in sex bias with the outgroup D. serrata, including transcripts of genes involved in brain development, Sxl target Glutamine synthetase 2 and ciboulot. Conclusions: Conserved alternative transcripts suggest that transcriptional diversity is a pervasive driver of the evolution of functional diversity.

6

Directional Gene-Level Concordance and Methodological Constraints in Blood Transcriptomic and DNA Methylation Studies of Parkinson's Disease

Kaur, R.; Dewan, C.; Chauhan, I.; Sharma, K.; Sharma, S.

2026-05-20 neuroscience 10.64898/2026.05.17.725808 medRxiv

Top 0.1%

8.6%

Assessing reproducibility across different molecular profiling studies is a persistent methodological challenge (Zhang et al., 2009; Sweeney et al., 2017; Ioannidis, 2005). Differences in platform technology, cohort composition, analytical pipelines, and feature definitions often make it difficult to interpret cross-study comparisons based solely on gene-identity overlap. In this study, we conducted a retrospective computational analysis of seven publicly available analytical datasets (including alternative analytical pipelines applied to the same cohort) derived from five biologically independent peripheral blood transcriptomic and DNA methylation cohorts, comprising 3,487 samples (1,824 Parkinson's disease cases and 1,663 controls). Reproducibility was evaluated using gene-identity overlap, enrichment-based comparisons, and a permutation-based framework to assess directional consistency of effect estimates across datasets. We also tested the robustness of results by varying false discovery rate thresholds and applying alternative probe-to-gene collapsing strategies. All analyses were performed using reproducible workflows implemented in R and Python with fixed random seeds. Across independent cohorts, gene-identity overlap was generally limited, with enrichment ratios close to one, especially when datasets were generated using different platforms. In several datasets, limited numbers of statistically significant features further constrained overlap-based comparisons. In contrast, directional consistency showed greater stability. High levels of directional consistency were observed across independent cohort comparisons when restricted to overlapping statistically significant features and remained stable across statistical thresholds (90.0% at FDR < 0.05 and 82.8% at FDR < 0.10). When evaluated across the full shared gene universe without conditioning on statistical significance, directional consistency was substantially lower (~30 to 32%) but remained significantly above permutation-based null expectations. Permutation testing confirmed that the observed directional consistency exceeded what would be expected by chance. A combined analysis including methodological replicates (n ≥ 3 datasets) showed 98.3% directional consistency; however, this estimate includes non-independent analytical pipelines applied to the same cohort and reflects analytical stability rather than independent biological replication. Rather than introducing a new statistical method, this study examines how commonly used reproducibility metrics behave under cross-study heterogeneity and identifies their practical limitations and appropriate use boundaries.
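
As an illustration of the kind of check this abstract describes, the sketch below computes directional consistency (the fraction of shared features whose effect estimates have the same sign in two datasets) and a permutation p-value with a fixed random seed. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the choice to shuffle one dataset's effect estimates to build the null, and the handling of zero effects are all hypothetical.

```python
import random


def directional_consistency(effects_a, effects_b):
    """Fraction of paired features whose effect estimates share a sign.

    Assumption: features with a zero effect in either dataset are skipped.
    """
    pairs = [(a, b) for a, b in zip(effects_a, effects_b) if a != 0 and b != 0]
    if not pairs:
        raise ValueError("no features with non-zero effects in both datasets")
    same = sum(1 for a, b in pairs if (a > 0) == (b > 0))
    return same / len(pairs)


def permutation_p_value(effects_a, effects_b, n_perm=1000, seed=0):
    """One-sided permutation p-value for the observed consistency.

    Assumption: the null is built by shuffling the feature labels of the
    second dataset, which breaks the pairing between the two datasets.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    observed = directional_consistency(effects_a, effects_b)
    shuffled = list(effects_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if directional_consistency(effects_a, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

Called on two lists of per-gene effect estimates in the same gene order, `directional_consistency` returns a value between 0 and 1, and `permutation_p_value` estimates how often a shuffled pairing does at least as well.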

7

QTL spanning the TGF-β2 locus is associated with muscle fiber hypertrophy in rainbow trout

Raghu, A.; Raymo, G.; Ahmed, R.; Ali, A. R.; Leeds, T.; Salem, M.

2026-05-27 genomics 10.64898/2026.05.24.727516 medRxiv

Top 0.1%

8.5%

Background: Skeletal muscle growth is a key determinant of body size and market value in salmonid aquaculture, yet the mechanisms linking genomic variation to muscle fiber hypertrophy remain poorly resolved. Myofiber cross-sectional area (CSA) provides a quantitative cellular proxy for fiber size and a direct link to macroscopic growth traits. Methods: We performed histological phenotyping of white skeletal muscle from rainbow trout (Oncorhynchus mykiss) representing divergent fillet-yield selection lines (ARS-FY-H and ARS-FY-L), quantifying mean myofiber CSA and fiber number using high-throughput image analysis. Genome-wide association analysis (GWAS) was conducted using low-pass whole-genome sequencing (~1x) with genotype imputation and functional variant annotation. RNA sequencing was performed on fish representing high and low CSA extremes to identify differentially expressed genes and enriched biological pathways. Results: Mean myofiber CSA was significantly associated with body weight, muscle weight, visceral weight, and body length (p < 0.05), while fiber count showed no significant association with most growth traits, implicating hypertrophy as the primary driver of muscle mass variation. GWAS identified a significant QTL spanning ~4.76 Mb on chromosome 2 (117 significant SNPs; Bonferroni-adjusted P ≤ 0.05; λ = 1.02). Associated variants were predominantly noncoding, enriched in intronic, intergenic, and enhancer-annotated regions. A high density of SNPs colocalized with the TGF-β2 locus, overlapping strong and genic enhancer elements in white muscle. Transcriptomic comparisons revealed that high-CSA muscle showed elevated expression of genes related to contractile function, cytoskeletal organization, and translation, while low-CSA muscle exhibited upregulation of extracellular matrix and immune-related genes consistent with a tissue remodeling state. Conclusions: Noncoding regulatory variation within a significant QTL spanning the TGF-β2 locus is associated with distinct transcriptional programs linked to muscle fiber hypertrophy in rainbow trout. By integrating genetic variation, chromatin-state annotation, and transcriptomic profiling, this study identifies candidate regulatory loci associated with variation in muscle cellularity and growth-related phenotypes in rainbow trout.

8

Multisite Evaluation of an Amplification-based Nanopore Sequencing Solution to Analyze Challenging Clinically Relevant Variants in Genes Associated with Hereditary Diseases

Filipovic-Sadic, S.; Parker, C. A.; Mihailovic, M. K.; Milligan, J. N.; Turner, J. M.; Borel, S. L.; Le, V.; Markulin, T.; Janovsky, J. W.; Killinger, B. J.; Deshotel, M. J.; Reading, N. S.; Fredrickson, E. K.; Ji, Y.; Close, D.; Wright, J.; Williams, M.; Barrie, E. S.; Martin, K. E.; Gray, S. M.; Haynes, B. C.; Hall, B.

2026-05-19 genetics 10.64898/2026.05.14.725224 medRxiv

Top 0.3%

6.4%

Purpose: Carrier screening for hereditary conditions is challenged by genes with complex genomic architecture, where short-read sequencing can fail to detect clinically relevant variants. This study evaluated a unified, amplification-based nanopore sequencing workflow across multiple laboratories for comprehensive analysis of such loci. Methods: A modular long-read sequencing assay was evaluated across five laboratories using targeted PCR enrichment, Oxford Nanopore sequencing, and automated variant analysis. The workflow interrogated genes associated with spinal muscular atrophy, thalassemia, cystic fibrosis, fragile X syndrome, congenital adrenal hyperplasia, Gaucher disease, and hemophilia A. Performance was assessed against orthogonal methods for single nucleotide variants (SNVs), indels, copy-number variants, repeat expansions, and structural rearrangements. Results: Across 882 unique samples (1,266 tests), overall agreement with comparator methods exceeded 96% for variant-level detection and 97% for genotype status classification. Long-read sequencing enabled phasing of paralogous loci, integrated sizing and interruption analysis for FMR1 repeats, and simultaneous detection of SNVs and structural variants in globin loci and the CYP21A2-TNXB region, reducing reliance on multiple workflows. Conclusion: This multisite evaluation suggests that targeted long-read sequencing can consolidate complex variant detection into a single workflow, improving analytical completeness and operational efficiency for carrier screening.

9

Stable yet Shifting: Early Toxin Dynamics in Typical and Atypical Clownfish-Anemone Symbioses

Macrander, J.; Bennett, A.; Statile, K.; Rudd, W.; Tolman, C.; Kuklina, S.; Burg, S.; Whitton, L.; Langford, G.

2026-05-29 genetics 10.64898/2026.05.26.727870 medRxiv

Top 0.3%

6.3%

Among venomous animals, cnidarians represent the oldest metazoan lineage in which venom production and a specialized delivery system are defining synapomorphies. Cnidarians also represent the only venomous lineage for which mutualistic symbioses have evolved, resulting in scenarios where mutualistic symbionts may also be targets of their venom. The most iconic example of this relationship is the mutualism between clownfish and their venomous sea anemone hosts. To investigate how symbiont presence and establishment influence toxin gene expression, we used a comparative TagSeq and RNA-Seq approach to quantify venom gene dynamics during the first 48 hours of clownfish-anemone symbiosis establishment in five anemone species. Our taxonomic sampling included three typical hosting species (Entacmaea quadricolor, Radianthus crispa, and Stichodactyla haddoni), each representing distinct evolutionary lineages of clownfish hosts, and two atypical Caribbean species (Condylactis gigantea and Stichodactyla helianthus) that do not host clownfish in nature, but have been reported to host within the aquarium trade. Tentacle samples were collected prior to hosting, approximately 12 hours after initial symbiont establishment, and again 48 hours after symbiosis establishment. Our analyses revealed that overall toxin assemblages remained relatively stable during the early establishment phase, with no significant changes in the most highly expressed toxin gene candidates. However, subtle transcript-level shifts occurred within multi-copy toxin gene families, including cytolytic actinoporins and Sea Anemone 8 (SA8)-like toxins. Notably, one C. gigantea actinoporin transcript exhibited a ~600-fold increase in expression in a single individual, coinciding with two clownfish mortalities prior to successful association; its expression subsequently decreased after establishment. Comparative sequence alignments suggest that amino acid substitutions in this transcript may be functionally relevant to symbiosis intolerance, as the amino acid substitutions were unique to this transcript, and not found in any other previously described cytolytic actinoporin. Together, these findings reveal that early toxin gene expression in clownfish-hosting sea anemones is largely stable, yet subtly dynamic at the transcript level. This study provides the first comparative transcriptomic insights into the molecular processes shaping symbiosis establishment in clownfish-anemone mutualisms, offering a framework for understanding venom evolution in the context of co-evolutionary interactions.

Highlights:
- Comparative gene expression survey reveals relatively stable toxin assemblages throughout the first 48 hours of establishing clownfish-anemone symbiosis.
- Subtle shifts were observed among transcript variants in multi-gene copy variants, with potential implications for barriers to establishing symbiosis.
- Although toxin assemblages varied among species, sea anemone 8 (SA8) toxin-like transcripts were highly abundant in four of the focal taxa.
- This is the first comparative gene expression analysis investigating molecular processes surrounding symbiosis establishment between clownfish and sea anemones.
- These results provide insight into toxin dynamics surrounding the establishment of symbiosis, with particular insights into key evolutionary transitions resulting in symbiosis among atypical clownfish hosting species.

10

A new method based on genome alignments provides a highly resolutive target enrichment set for weevils (Coleoptera, Curculionoidea)

Zelvelder, B.; Benoit, L.; Loiseau, A.; Haran, J.; Allio, R.

2026-05-13 evolutionary biology 10.64898/2026.05.09.724036 medRxiv

Top 0.3%

6.3%

Target enrichment methods have provided unprecedented advances in phylogenomics. Targeting hundreds of conserved regions has proven to be a good tradeoff between cost and efficiency, while being useful for museomics and diversified non-model clades. Unfortunately, current methods used for identifying such regions involve high degrees of conservation within targeted elements, usually pushing researchers to rely on flanking data with little guarantee of homology. The growing number of high-quality genomes available throughout the Tree of Life creates new opportunities to improve marker selection. In this study, we introduce GABBI, a new method for designing target capture probes by taking advantage of genome alignments, avoiding the selection of a single reference genome that can cause notable biases. We compare GABBI-derived markers to those from the most commonly used probe design method, PHYLUCE, at two taxonomic scales, the weevil superfamily Curculionoidea and the tribe Pachyrhynchini. At both taxonomic scales, results show that our new method identifies more variable loci that prove to be more phylogenetically resolutive than the PHYLUCE-derived ones. In doing so, we provide the first probe set specifically designed for weevils, targeting a wide set of 4,255 shared homologous regions, encouraging future research on systematics and macroevolution of one of the most diverse and economically important groups of insects. By providing GABBI as an automated and open-access pipeline, we hope to open new probe design opportunities to other taxonomic groups that face similar phylogenetic obstacles.

11

The impact of long-read sequencing on fungal genome assemblies: progress and disparity

Kroll, E.; Zoclanclounon, Y. A. B.; Urban, M.; Hill, R.; Hammond-Kosack, K. E.

2026-05-14 genomics 10.64898/2026.05.12.724544 medRxiv

Top 0.3%

6.2%

Fungal genomics has expanded rapidly over the past 30 years, and recently the pace and breadth have further increased for many taxa, although many taxonomic gaps persist. With three decades of rapid growth, fungal genomics now merits a re-examination of its history, progress, and unresolved taxonomic gaps. Here, we review the development of fungal genomics from early efforts such as the Fungal Genome Initiative to current progress driven by third-generation long-read sequencing. We have compiled and summarised publicly available fungal genomes to highlight trends in assembly quality, adoption of long-read technologies, and taxonomic representation. Notably, substantial phylogenetic gaps remain, particularly outside Dikarya, and significant challenges persist for unculturable taxa. This review identifies priorities for the fungal community, including: (1) coordinated efforts to close major taxonomic gaps across the fungal tree of life; (2) improved repository metrics to facilitate identification of high-quality assemblies; and (3) improved and standardised genome annotation, which is lacking for most assemblies. Together, these steps will support the development of reliable genomic resources that capture the full breadth of diversity across the fungal kingdom, generating foundational data for comparative genomics, evolutionary biology, functional studies, genetic studies and applied research.

12

Antimicrobial use contributes to resistance gene enrichment across cattle groups on commercial dairy farms

Steinberger, A. J.; Nickodem, C. A.; Leite de Campos, J.; Kates, A. E.; Goldberg, T. L.; Safdar, N.; Sethi, A. K.; Shutske, J. M.; Ruegg, P. L.; Suen, G.; Hite, J. L.

2026-05-26 genomics 10.64898/2026.05.22.726633 medRxiv

Top 0.4%

5.0%

Antimicrobial use (AMU) in agricultural systems is frequently linked to antimicrobial resistance (AMR). Yet, the scale at which AMU reshapes host-associated resistomes remains unclear. This gap arises, in part, from the scarcity of farm-level AMU data from commercial production systems. Here, we combine detailed AMU records from commercial dairy farms with metagenomic analyses of bovine fecal resistomes from calves, lactating cows, sick cows, and cull cows. At a broad level, resistome profiles were similar regardless of farm AMU. Resistance associated with historically common antibiotics, such as tetracyclines, was frequent on low- and high-AMU farms, indicating that some resistance classes are ubiquitous in dairy systems regardless of current AMU. In contrast, resistance to other drug classes varied systematically with AMU. Higher AMU was associated with increased resistance to aminoglycosides, β-lactams, and macrolides, drug classes that are critical for treating mastitis and bovine respiratory disease. Resistance gene richness and diversity were highest in calves, underscoring the importance of accounting for host traits alongside AMU when evaluating resistance patterns. Together, these findings underscore the need for detailed, farm-level AMU data to understand how management practices shape AMR and to inform strategies for sustaining the effectiveness of existing antimicrobials in agricultural and public-health contexts.

13

Verification of human nucleotide sequence reagents and cell line identities in original circRNA articles published in high impact factor journals

Pathmendra, P.; Enguita, F. J.; Byrne, J. A.

2026-05-29 genomics 10.64898/2026.05.28.728608 medRxiv

Top 0.4%

4.9%

The number of research articles studying circRNAs has increased rapidly since 2017. Previous analyses of human circRNA articles in two high impact factor cancer research journals identified papers with wrongly identified nucleotide sequence reagents and circRNAs whose identities could not be independently verified. In the present study, verification of human nucleotide sequence reagent and cell line identities in retracted circRNA articles published from 2017-2021 in high impact factor journals found wrongly identified nucleotide sequences and/or cell lines in all 13 retracted papers. Similar analyses of human circRNA papers published in high impact factor journals in 2022 found wrongly identified, non-verifiable and/or questionable reagents in 71% (84/118) of papers, where 51% (60/118) of papers described at least one wrongly identified reagent. When individual error types and features of concern were considered, 2022 circRNA papers described wrongly identified nucleotide sequence reagents (52/118, 44%), questionable circRNA probes that did not meet accepted targeting requirements (34/118, 29%), non-verifiable nucleotide sequences (25/118, 21%), wrongly identified cell lines (22/118, 19%), and/or non-verifiable cell line identifiers (6/118, 5%). In summary, wrongly identified, non-verifiable and/or questionable reagents were unexpectedly frequent in human circRNA papers in high impact journals, highlighting the need for critical engagement with the circRNA literature.

14

In silico restriction site analysis of whole genome sequences shows patterns caused by selection and sequence duplications

Vedder, L.; Schoof, H.

2026-05-16 genomics 10.64898/2026.05.15.725336 medRxiv

Top 0.4%

4.9%

Biological sequences are known to be non-random. Thus, the comparison of in silico restriction fragment distributions of random and biological sequences may be an indicator of this non-randomness. Our analyses show that for most of the tested combinations of restriction enzyme and genome sequence, the number of fragments per megabase of the biological sequence deviates by more than 10% from that of the corresponding random sequence. This deviation goes in both directions, i.e. clearly increased values are as common as clearly decreased values. Although there is no species- or restriction-enzyme-specific effect, a clear impact of the GC content both of the restriction site and of the genome sequence can be seen. In contrast to the random sequences, the genome sequences show distinct peaks in their fragment length distributions, hinting at repetitive elements such as transposons.
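
To make the comparison in this abstract concrete, the sketch below digests a linear sequence in silico and compares the observed fragments per megabase with the value expected for a random sequence of a given GC content. It is a minimal sketch under stated assumptions, not the authors' method: the function names are hypothetical, the cut is placed at the start of each recognition site, and the random model assumes independent bases with G and C each at frequency GC/2 and A and T each at (1 − GC)/2.

```python
def fragment_lengths(seq, site):
    """Fragment lengths after cutting a linear sequence at every site occurrence.

    Assumption: the cut is placed at the first base of the recognition site.
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos)
        pos = seq.find(site, pos + 1)  # allow overlapping occurrences
    # A cut at position 0 does not create an extra fragment.
    bounds = [0] + [c for c in cuts if c > 0] + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def fragments_per_mb(seq, site):
    """Observed number of fragments per megabase of sequence."""
    return len(fragment_lengths(seq, site)) / len(seq) * 1_000_000


def expected_sites_per_mb(site, gc):
    """Expected site occurrences per megabase in a random sequence.

    Assumption: bases are independent, with P(G) = P(C) = gc / 2 and
    P(A) = P(T) = (1 - gc) / 2.
    """
    prob = 1.0
    for base in site.upper():
        prob *= gc / 2 if base in "GC" else (1 - gc) / 2
    return prob * 1_000_000
```

Under this model, a six-base site such as GAATTC is expected about 244 times per megabase at 50% GC; for long sequences the fragment count is roughly the site count, so `fragments_per_mb` on a real genome can be compared against that expectation.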

15

Efficient and Robust Genomic DNA Isolation and Next-Generation Sequencing Library Preparation from Recalcitrant Wild Grape Species

Bhattarai, A.; Smith, J.; Abdelgaffar, H.; Carpenter, R.; Mishra, S.; Fuentes, J. L. J.; Shirsekar, G.

2026-05-21 genomics 10.64898/2026.05.19.713680 medRxiv

Top 0.4%

4.9%

This protocol details the extraction of high-molecular-weight genomic DNA from grapevine tissues (wild and cultivated Vitis spp., including pathogen-infected samples) and the subsequent preparation of Illumina® whole-genome sequencing libraries using bead-bound Tn5 transposase. It is designed to overcome challenges from polyphenolic compounds and secondary metabolites in wild plants, providing a cost-effective workflow for large-scale population genomics. It includes recipes for buffers, incubation times, critical notes, and troubleshooting tips to maximize yield and library quality. Although designed for grapevine DNA, this protocol is potentially applicable to other similar wild plant species.

Highlights:
- Optimized CTAB-PTB DNA extraction protocol for field-collected wild plant tissues.
- Effective removal of polyphenols and secondary metabolites associated with DNA using PTB.
- Cost-effective Illumina DNA Prep library preparation using bead-bound Tn5 transposase (Tagmentation).
- Scalable workflow suitable for large-scale population genomics in Vitis species.
- Validated method for high-molecular-weight DNA and high-quality sequencing data.

Graphical Abstract: Figure 1.

16

PCR-free, targeted genomic sequencing using Dynamically optimized reference Adaptive Sampling (DORAS)

Borcard, L.; Gempeler, S.; Terrazos Miani, M. A.; Casanova, C.; Ramette, A.

2026-05-29 genomics 10.64898/2026.05.26.727915 medRxiv

Top 0.4%

4.8%

Whole genome sequencing (WGS) has become a cornerstone of clinical microbiology, enabling comprehensive analysis of microbial genome diversity. However, WGS is often computationally intensive and time-consuming when applied to specific applications like multilocus sequence typing (MLST), where only a subset of genes is needed for typing. This study evaluates the potential of adaptive sampling (AS), a software-based solution available on Oxford Nanopore Technologies (ONT) devices, to optimize sequencing runs for MLST by reducing the production of unnecessary reads falling outside of the target areas. We demonstrate that AS, when used directly with the target gene sequences, does not reach sufficient target coverage when compared to WGS baseline sequencing due to inefficient read recruitment. Thus, we developed a novel, PCR-free approach, termed Dynamically Optimized Reference Adaptive Sampling (DORAS), which streamlines gene-specific enrichment by targeting genomic regions of interest and their genomic vicinity. DORAS first determines the genomic context of regions of interest for each sample, and then dynamically adjusts the length of the reference sequences during live sequencing. Consensus sequences are periodically constructed and evaluated for taxonomic classification. We demonstrate that full MLST profiles can be obtained in approximately half the time required for whole-genome sequencing to achieve 30X coverage (3 vs. 6 h), with no additional hands-on library preparation time. Validation on clinical isolates from hospital outbreaks belonging to Corynebacterium diphtheriae and vancomycin-resistant Enterococci, and on routine clinical E. coli isolates, demonstrated consistent retrieval of MLST types compared with standard WGS methods. DORAS thus offers a cost-effective, efficient solution for routine surveillance and outbreak investigations based on MLST types in the clinical setting.

17

Library preparation strategy critically impacts RNA virus sensitivity in clinical metagenomics

Stepniak, D.; Constantinides, B.; Weaver, M.; Treagus, S.; Wilkinson, S. A.; Quarton, S.; Behruznia, M.; Cumley, N.; Tyson, J.; McNally, A.; Loman, N. J.; Pullan, S.; Quick, J.

2026-05-21 genetic and genomic medicine 10.64898/2026.05.18.26353500 medRxiv

Top 0.4%

4.8%

Clinical metagenomics uses sequencing for culture-independent identification of pathogens directly from clinical specimens. While a number of protocols claim to be pathogen agnostic, sensitivity for RNA viruses is likely lower than for bacteria or fungi, as it requires additional processing steps including conversion to cDNA. Sequence-independent, single-primer amplification (SISPA) was first described in 1991, yet how it preferentially enriches viral molecules has never been described. Here we propose that single-primer amplification exploits the PCR suppression effect, which selectively amplifies longer viral molecules over shorter host-derived cDNA fragments on the basis of size. This model predicts that any upstream processing step that disrupts fragment length will prevent this enrichment from occurring. To test this, we systematically compared two adapter introduction strategies, during cDNA synthesis and via tagmentation, each followed by single-primer amplification, using the ZeptoMetrix Respiratory Panel 2.1 containing 16 RNA and 3 DNA virus strains. SISPA-based approaches recovered all of the viral genomes in the control, whereas using tagmentation to amplify cDNA recovered none. We then spiked the controls into extracted clinical samples and found that SISPA-based methods performed best in all background settings; however, in high-background settings no viral genomes were recovered by any approach. Finally, using a modified SMART-9N protocol, we demonstrated that single-primer PCR is critical to overall performance, indicating that direct tagmentation of cDNA and dual-primer PCR should be avoided in protocols for clinical metagenomics where high sensitivity for RNA viruses is critical. These findings demonstrate that library preparation strategy fundamentally determines RNA virus sensitivity and offer mechanistic insights for protocol optimisation with direct relevance to clinical metagenomics.

18

Circular RNA-associated QTLs show stronger association with splicing-QTLs than with expression-QTLs

Zabala, A.; Ascension, A. M.; Iniguez, S. G.; Iparraguirre, L.; Andres-Leon, E.; Matesanz, F.; Otaegui, D.; Munoz-Culla, M.

2026-05-29 genetics 10.64898/2026.05.29.728707 medRxiv

Top 0.5%

4.7%

Introduction: Circular RNA quantitative trait loci (circQTLs) have emerged as a class of regulatory variants, but their mechanistic basis remains poorly characterized. Understanding how genetic variation influences circRNA biogenesis is essential to clarify their role in post-transcriptional gene regulation. Methods: We systematically compared circQTLs with matched splicing (sQTL) and expression (eQTL) datasets. Using bootstrap-based Jaccard similarity analyses, we quantified genomic overlap patterns and assessed their statistical significance. We further validated these findings across independent circQTL studies. In addition, we analyzed the genomic distribution of circQTLs to identify enrichment patterns across functional genomic regions. Results: circQTLs exhibited a statistically significant but modestly stronger genomic overlap with sQTLs compared to eQTLs. This pattern was consistent across independent datasets despite limited reproducibility of individual circQTL signals. Genomic annotation revealed distinct distributional patterns, including depletion in exonic regions and relative enrichment in non-coding genomic contexts compared to other QTL classes. Discussion: Together, these results suggest that circRNA-associated regulatory variation is preferentially linked to splicing-related mechanisms rather than transcriptional control of host genes. However, the modest effect size indicates that this relationship is not exclusive, and likely reflects a mixture of shared splice-site regulatory effects and additional mechanisms specific to back-splicing that are not captured by conventional sQTL or eQTL frameworks. This dual architecture positions circRNA biogenesis at the interface between splicing dynamics, RNA structure, and higher-order genomic organization, supporting circQTLs as a distinct layer of post-transcriptional gene regulation.
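
A bootstrap-based Jaccard comparison of the kind mentioned in the Methods could look roughly like the Python sketch below. The abstract does not specify the resampling scheme, so everything here is an assumption: variants are treated as hashable identifiers, the circQTL set is resampled with replacement, and a percentile interval is reported for the difference between the two Jaccard indices. The function names are hypothetical.

```python
import random


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections, 0.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def bootstrap_jaccard_diff(circ, sqtl, eqtl, n_boot=1000, seed=0):
    """95% percentile interval for J(circ, sQTL) - J(circ, eQTL).

    Assumption: only the circQTL set is resampled (with replacement);
    the sQTL and eQTL sets are held fixed.
    """
    rng = random.Random(seed)
    circ = list(circ)
    diffs = []
    for _ in range(n_boot):
        sample = rng.choices(circ, k=len(circ))
        diffs.append(jaccard(sample, sqtl) - jaccard(sample, eqtl))
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[min(int(0.975 * n_boot), n_boot - 1)]
    return lower, upper


if __name__ == "__main__":
    # Toy variant identifiers, not real data.
    circ = range(0, 100)
    sqtl = range(0, 80)
    eqtl = range(90, 170)
    print(bootstrap_jaccard_diff(circ, sqtl, eqtl))
```

An interval lying entirely above zero would be read as stronger overlap with sQTLs than with eQTLs under this particular resampling scheme; other schemes, such as permuting genomic positions, would test a different null hypothesis.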

19

Identification of genes important for response of Pseudomonas aeruginosa biofilms to ciprofloxacin exposure

Wang, M.; Holden, E. R.; Yasir, M. R.; Bastkowski, S.; Turner, K.; Sims, L. P.; Gilmour, M. W.; Charles, I. G. W.; Webber, M. A.

2026-05-29 genomics 10.64898/2026.05.27.728104 medRxiv

Top 0.5%

4.3%

Pseudomonas aeruginosa is an opportunistic pathogen that can cause severe infections in immunocompromised individuals, such as patients with cystic fibrosis, where it commonly forms biofilms. Ciprofloxacin is used extensively to treat P. aeruginosa infections, but its effectiveness can be significantly reduced by biofilm formation. Although many individual genes associated with biofilm formation or ciprofloxacin resistance have been characterised, the genetic basis of P. aeruginosa biofilm fitness related to antibiotic challenge remains incompletely understood. In this study we employed a whole genome screen to assay the impact of gene disruptions or altered gene expression on survival of P. aeruginosa biofilms exposed to different concentrations of ciprofloxacin. Genes impacting fitness in the biofilm context were identified by comparing the biofilm samples to planktonic samples harvested at 12h, 24h and 48h with and without ciprofloxacin. Genes associated with c-di-GMP regulation and Gac/Rsm signalling were identified as primary regulators of biofilm formation in the presence and absence of ciprofloxacin. In addition, a group of genes involved in respiration, metabolism (especially polyamine metabolism), and various transporter and efflux systems were identified as important for biofilm fitness. Ciprofloxacin specifically imposed a selective pressure on flagellar function and Psl production, which were essential for survival in early biofilms. Moreover, transposon insertions within the CPA gene clusters (PA5448-PA5451 and PA5455-PA5456) and the salvage peptidoglycan recycling pathway showed reduced fitness in late biofilms at high concentrations of ciprofloxacin, indicating that cell envelope integrity is beneficial for mature biofilms. This study identifies important determinants of survival for biofilms at different stages of maturity in the presence and absence of ciprofloxacin and implicates potential therapeutic targets for antibiofilm drug development.

20

Transcriptomic profiling of embryo-derived cell lines from the Chagas disease insect vector Rhodnius prolixus

de Andrade Tavares, L.; Garcia, A. C.; Bell-Sakyi, L.; Fontenele de Brito, T.; Pane, A.

2026-05-12 genetics 10.64898/2026.05.08.723764 medRxiv

Top 0.5%

4.3%

Rhodnius prolixus is a primary insect vector of Trypanosoma cruzi, the causative agent of Chagas disease, a neglected parasitosis endemic in Latin American countries. It has been estimated that Chagas disease affects 7-8 million people worldwide and is responsible for approximately 1000 deaths per year. Genetic and molecular studies in this species remain challenging due to its life cycle and feeding habits, thus hindering the development of new strategies to control its populations and reduce the diffusion of Chagas disease. Recently, two stable cell lines, RPE/LULS53 and RPE/LULS57, were derived from Rhodnius embryos; they represent promising new tools to investigate the genetics of this insect vector. Here, we describe their gene expression landscapes through transcriptomic approaches. We show that 8,968 expressed genes are shared between the two cell lines, whereas 391 and 1,088 genes are uniquely expressed in RPE/LULS53 and RPE/LULS57, respectively. Although key components of primary developmental, immune and redox signaling pathways are expressed in both cell lines, some genes such as Frizzled-10-a-like and catalase show marked differences in expression. Our results strongly suggest that RPE/LULS53 and RPE/LULS57 likely represent two different cell phenotypes. Consistent with this, gene ontology analysis reveals that RPE/LULS53 is enriched for animal organ morphogenesis and stress response, while RPE/LULS57 is enriched for DNA-directed RNA polymerase activity, among others. Despite these differences, both cell lines express comparable levels of transcripts from resident transposable elements, including the highly abundant Mariner and LINE/I elements, as well as horizontally transferred transposons. Our findings shed light on the nature of the RPE/LULS53 and RPE/LULS57 embryo-derived cell lines and provide valuable transcriptomic resources for future genetic and functional studies in Rhodnius and other triatomine insect vectors.
Author summary: Rhodnius prolixus is a blood-feeding insect and a major vector of Chagas disease, a parasitosis endemic in Latin America and affecting millions of people worldwide. In the absence of effective drugs and vaccines, the control of the insect population represents a promising strategy to reduce the diffusion of the disease. Yet, genetic and functional studies in Rhodnius are extremely challenging due to its feeding habit and life cycle. To overcome these limitations, researchers have previously developed two stable cell lines derived from Rhodnius embryos. In this study, we provide the first characterization of the genes expressed in these cell lines. We found that, while the two cell lines share many expressed genes, each of them also has distinct gene expression patterns pointing to two different cell types with specialized functions. These differences likely affect the way they respond to stress and regulate biological processes. Our findings provide an important resource for researchers studying Rhodnius prolixus and other insect vectors, helping advance our understanding of the genetic and molecular mechanisms that control insect development and mediate the interactions between insect vectors and the parasites they transmit.